Capturing a phylogenetic tree when the number of character states varies with the number of leaves

Author

  • Mike Steel
Abstract

We show that for any two values $\alpha, \beta \in (0, 1)$ with $\alpha + \beta > 1$, there exists a value $N$ such that for all $n \ge N$ and any binary phylogenetic tree $T$ on $n$ leaves, there exists a set of at most $n^{\beta}$ characters that capture $T$, each of which has at most $n^{\alpha}$ states. Here 'capture' means that $T$ is the only phylogenetic tree that is compatible with the characters. Our short proof of this combinatorial result is based on the probabilistic method.

Suppose that $k$ characters on $n$ taxa capture some phylogenetic tree $T$ (i.e. the characters are compatible with $T$ and with no other tree; thus $T$ is a binary unrooted tree with the $n$ taxa as its leaves). If each of the $k$ characters has at most $r$ states, then a simple and known inequality states that $k$ must be at least $(n-3)/(r-1)$ (Proposition 4.2 of [5]). Remarkably, this lower bound was recently shown [2] to be sharp for every value of $r \ge 2$, provided that $n \ge n_r$, where $n_r$ is some (increasing) function of $r$.

In this note, we consider how small $k$ can be when $r$ is allowed to depend on $n$. From [1, 3] it is known that $k = 4$ holds for a certain sequence $r_n$ for which $n/r_n = O(1)$, so we focus on the setting where both $r_n$ and $n/r_n$ tend to infinity with increasing $n$. We consider what happens when $r_n$ grows as a sub-linear function of $n$, by constraining $r_n$ to be less than or equal to $n^{\alpha}$ for $\alpha \in (0, 1)$, in which case the inequality $k \ge (n-3)/(r-1)$ implies that $k$ must exceed $n^{\beta}$ for $\beta = 1 - \alpha$, for $n$ sufficiently large.

The following result is independent of the result from [2] mentioned above, in the sense that neither result directly implies the other. Our short proof involves a simple application of the probabilistic method, the Chernoff bound, and a property of the random cluster model on trees established in [4].

Theorem 1. For any two values $\alpha, \beta \in (0, 1)$ with $\alpha + \beta > 1$, there exists a value $N$ such that for all $n \ge N$ and any unrooted binary phylogenetic tree $T$ on $n$ leaves, there exists a set of at most $\lfloor n^{\beta} \rfloor$ characters that capture $T$, each of which has at most $r_n = \lfloor n^{\alpha} \rfloor$ states.

Proof. Let $X$ denote the leaf set of $T$. Consider the random cluster model on $T$ in which each edge of $T$ is independently cut with probability $p_n = r_n/(4n)$, or left intact with probability $1 - p_n$. This leads to a partition of $X$ corresponding to the equivalence relation in which two leaves are related if and only if they lie in the same connected component of the resulting graph. We will regard such a partition as equivalent to a character (with the number of 'states' of the character being the number of blocks of the partition). Notice that $\lim_{n \to \infty} p_n = 0$.

Let $Y$ denote the random number of edges of $T$ that are cut. Then $Y$ has a binomial distribution $Y \sim \mathrm{Bin}(2n - 3, p_n)$, which has mean $\mu_n = (2n - 3)p_n = (\tfrac{1}{2} - o(1))\, n^{\alpha}$. By a multiplicative form of the 'Chernoff bound' in probability theory, $P(Y \ge 2\mu_n) \le \exp(-4\mu_n/3)$, and since $r_n > 2\mu_n$ we obtain:

$$P(Y \ge r_n) \le \exp(-4\mu_n/3). \tag{1}$$

The number of blocks of the partition of $X$ induced by randomly cutting edges of $T$ in this way is at most $Y + 1$. Thus, the probability that a character generated by the random cluster model, with $p_n$ as specified, has strictly more than $r_n$ states is at most $P(Y + 1 > r_n) = P(Y \ge r_n) \le \exp(-4\mu_n/3)$, by (1). Thus, if we generate a set $S_n$ of $\lfloor n^{\beta} \rfloor$ such characters, the probability that at least one of these characters has more than $r_n$ states is, by Boole's inequality,
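
The sampling step used in the proof can be illustrated with a small simulation. The following Python sketch is only an illustration under stated assumptions: the helper names `caterpillar_edges`, `random_character` and `lower_bound` are hypothetical, the caterpillar tree is just one convenient binary tree (the theorem concerns any binary tree), and the values of $n$, $\alpha$ and $\beta$ are arbitrary choices with $\alpha + \beta > 1$. It cuts each edge independently with probability $p_n = r_n/(4n)$, reads off the induced leaf partition as a character, and checks that the number of states never exceeds $Y + 1$. It does not test whether the sampled characters capture the tree.

```python
import math
import random


def caterpillar_edges(n):
    """Edges of an unrooted binary caterpillar tree (illustrative choice).

    Leaves are labelled 0..n-1 and internal vertices n..2n-3, for n >= 3.
    """
    internal = list(range(n, 2 * n - 2))  # n - 2 internal vertices
    edges = [(internal[i], internal[i + 1]) for i in range(len(internal) - 1)]
    edges.append((0, internal[0]))        # extra leaf at one end of the spine
    edges.append((n - 1, internal[-1]))   # extra leaf at the other end
    for leaf in range(1, n - 1):          # one leaf per spine vertex
        edges.append((leaf, internal[leaf - 1]))
    return edges                          # 2n - 3 edges in total


def random_character(edges, n, p, rng):
    """Cut each edge independently with probability p.

    Returns (states, cuts): states[i] labels the connected component that
    contains leaf i, and cuts is the number of cut edges (Y in the proof).
    """
    parent = list(range(2 * n - 2))       # union-find over all vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cuts = 0
    for u, v in edges:
        if rng.random() < p:
            cuts += 1                     # edge is cut
        else:
            parent[find(u)] = find(v)     # edge is kept: merge components
    states = [find(leaf) for leaf in range(n)]
    return states, cuts


def lower_bound(n, r):
    """Smallest integer k satisfying k >= (n - 3)/(r - 1), for r >= 2."""
    return math.ceil((n - 3) / (r - 1))


if __name__ == "__main__":
    n, alpha, beta = 1000, 0.6, 0.6       # assumed values, alpha + beta > 1
    r_n = math.floor(n ** alpha)
    p_n = r_n / (4 * n)
    rng = random.Random(0)
    edges = caterpillar_edges(n)
    too_many = 0
    for _ in range(math.floor(n ** beta)):
        states, cuts = random_character(edges, n, p_n, rng)
        assert len(set(states)) <= cuts + 1
        too_many += len(set(states)) > r_n
    print("r_n =", r_n, " lower bound on k =", lower_bound(n, r_n))
    print("characters with more than r_n states:", too_many)
```

A union-find structure is used here only because it makes the partition easy to read off; any connected-components routine would serve equally well.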


Similar resources

Minimizing Phylogenetic Number to find Good Evolutionary Trees

Inferring phylogenetic trees is a fundamental problem in computational biology. We present a new objective criterion, the phylogenetic number, for evaluating evolutionary trees for species defined by biomolecular sequences or other qualitative characters. The phylogenetic number of a tree T is the maximum number of times that any given character state arises in T. By contrast, the classical pars...


Direct Molecular Detection and Phylogenetic Tree Analysis of Gastrointestinal Protozoan Parasites (Giardia lamblia, Entamoeba histolytica, Cryptosporidium parvum) from Diarrhea Infection in Kut City of Iraq: A Short Communication

Background: The human intestinal tract can be infected by protozoan parasites. In this short communication, stool samples were collected from patients with diarrhea referred to Kut hospital, Iraq, and the parasites (Giardia lamblia, Entamoeba histolytica, Cryptosporidium parvum) were then considered for molecular identification. Methods: Stool samples were collected from 69 patients wit...


Combinatorial Scoring of Phylogenetic Networks

Construction of phylogenetic trees and networks for extant species from their characters represents one of the key problems in phylogenomics. While the solution to this problem is not always uniquely defined and there exist multiple methods for tree/network construction, it becomes important to measure how well the constructed networks capture the given character relationship across the species. In...


A Dynamic Method for Answering Ad Hoc Aggregate Continuous Queries

Data streams are infinite, fast sequences of time-stamped data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...


A Polynomial-Time Algorithm for the Phylogeny Problem when the Number of Character States is Fixed

We present a polynomial-time algorithm for determining whether a set of species, described by the characters they exhibit, has a phylogenetic tree, assuming the maximum number of possible states for a character is fixed. This solves an open problem posed by Kannan and Warnow. Our result should be contrasted with the proof by Steel and Bodlaender, Fellows, and Warnow that the phylogeny problem is...



Publication date: 2015